Biosynthetic systems producing fungal indole alkaloids

ABSTRACT

The biosynthesis of fungal bicyclo[2.2.2]diazaoctane indole alkaloids, which display a wide spectrum of biological activities, has attracted increasing interest. Their intriguing mode of assembly has long been proposed to feature a non-ribosomal peptide synthetase, a presumed intramolecular Diels-Alderase, a variable number of prenyltransferases, and a series of oxidases responsible for the diverse tailoring modifications of their cyclodipeptide-based structural core. Until recently, the details of these biosynthetic pathways have remained largely unknown due to a lack of information on the fungal-derived biosynthetic gene clusters. Herein, we report a comparative analysis of four natural product metabolic systems of a select group of bicyclo[2.2.2]diazaoctane indole alkaloids including (+)/(−)-notoamide, paraherquamide and malbrancheamide, in which we propose an enzyme for each step in the biosynthetic pathway based on deep annotation and ongoing biochemical studies.

PRIORITY CLAIM

This application claims priority benefit of U.S. Provisional Patent Application No. 61/620,176, filed Apr. 4, 2012, and U.S. Provisional Application No. 61/622,265, filed Apr. 10, 2012, the disclosures of which are incorporated in their entireties herein.

STATEMENT OF GOVERNMENT INTEREST

This invention was made with government support under Grant Number R01 CA070375 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.

INCORPORATION BY REFERENCE

This application contains, as a separate part of disclosure, a Sequence Listing in computer-readable form (filename: 46904PCT_SeqListing.txt; created Mar. 27, 2013; 446,686 bytes—ASCII text file) which is incorporated herein by reference in its entirety.

BACKGROUND

Natural products continue to be a rich source of clinical drugs for treatment of human and animal diseases.^(1,2) With respect to drug development, advanced understanding of their biosynthesis is significant for rational strain improvement efforts. This includes genetic manipulation (e.g., gene knock-out, knock-in, and whole gene cluster amplification) of the key biosynthetic and regulatory genes in order to increase the yield of pharmaceuticals to a desired level.³⁻⁶ Knowledge of biosynthesis is also valuable for guiding generation of novel natural product analogs as new drug candidates by metabolic engineering, mutasynthesis and allied approaches.⁷⁻¹¹ In addition, biochemical characterization of diverse biosynthetic enzymes continues to reveal new catalytic mechanisms that inspire the invention of novel chemical and biological catalysts in organic chemistry for production of fine chemicals and medicinal agents.^(12,13)

Elucidation of the biosynthetic pathway of a particular natural product or a family of natural products first requires identification of the gene cluster encoding its production.¹⁴⁻¹⁶ Next, the combined genetic (in vivo) and biochemical characterization (in vitro) of each individual biosynthetic enzyme provides important information, including enzyme substrate specificity, co-factor requirements, and the precise order of multiple biosynthetic steps.^(17,18) With this information available, it becomes possible to reconstitute the entire biosynthetic pathway in a heterologous host¹⁹⁻²¹ or in a multi-component in vitro reaction.^(22,23)

Across all microbes, plants and animals that generate natural products, it is particularly challenging to elucidate a biosynthetic pathway completely when unprecedented steps are involved, or when prior knowledge of biosynthetic origin is limited or non-existent. Conventionally, the search for enzymes that catalyze these unusual biotransformations via unexplored mechanisms depends on applying reasonable biosynthetic principles and on screening all possible candidate enzymes for activity against all hypothetical substrates.^(18,24,25) Thus, the entire process can require prolonged and intensive efforts, especially for complex natural products assembled by a large number of biosynthetic enzymes.

The discovery of natural products from different microorganisms that bear the same unique structural core, but vary from one another in their tailoring groups, creates opportunities for facile identification of unique enzymes. In this scenario, comparative bioinformatic analysis suggests that homologous genes can be linked to formation of a common structural core, whereas cluster-specific genes provide the basis for structural differences.²⁶⁻²⁹ Recent advances in whole genome sequencing technology have made this approach rapid and cost-effective.³⁰⁻³⁴ Thus, identification of biosynthetic gene clusters for structurally related natural products from different microorganisms has become practical for comparative analysis of these systems. Deep annotation provides adequate information to develop hypotheses regarding key gene(s) and their protein products. This in turn guides experimental strategies to explore unusual biotransformation(s) of interest using genetic and/or biochemical approaches. Although considerable information can be gleaned from biosynthetic pathway mining and annotation, putative biochemical function can only be verified by analysis of the gene product in vitro using natural or suitable model substrates.
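The comparative logic described above (homologous genes point to the common core, cluster-specific genes point to structural differences) can be illustrated with a minimal sketch. It assumes each gene has already been assigned to a homolog family by some upstream annotation step; the cluster names, family labels, and the function name partition_gene_families are hypothetical placeholders and do not represent the gene clusters of this disclosure.

```python
from collections import Counter


def partition_gene_families(clusters):
    """Split homolog families into those shared by every cluster
    and those found in exactly one cluster.

    clusters: dict mapping a cluster name to a list of family labels.
    Returns (shared, specific) where shared is a set of labels and
    specific maps each cluster name to its unique family labels.
    """
    # Count in how many clusters each family occurs (once per cluster).
    counts = Counter()
    for families in clusters.values():
        counts.update(set(families))

    n_clusters = len(clusters)
    shared = {fam for fam, c in counts.items() if c == n_clusters}
    specific = {
        name: {fam for fam in set(families) if counts[fam] == 1}
        for name, families in clusters.items()
    }
    return shared, specific


# Hypothetical placeholder data, for illustration only.
example_clusters = {
    "cluster_1": ["fam_a", "fam_b", "fam_c"],
    "cluster_2": ["fam_a", "fam_b", "fam_d"],
    "cluster_3": ["fam_a", "fam_b"],
}

shared, specific = partition_gene_families(example_clusters)
print("candidate core families:", sorted(shared))
for name, fams in specific.items():
    print(name, "cluster-specific families:", sorted(fams))
```

Families present in every cluster are candidates for building the common core, and families unique to one cluster are candidates for its distinctive tailoring steps. As noted above, such assignments are only hypotheses until the gene product is analyzed in vitro.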

DESCRIPTION OF THE DRAWINGS

FIG. 1 (A) Structures of (±)-notoamide A ((±)-1), paraherquamide A (2), and malbrancheamide (3). The unique structural features in 2 and 3 compared to 1 are highlighted in dashed boxes; (B) Proposed formation of the antipodal bicyclo[2.2.2]diazaoctane ring systems.

FIG. 2—The (−)-notoamide A (not), (+)-notoamide A (not′), paraherquamide (phq), and malbrancheamide (mal) biosynthetic gene clusters identified from genome sequencing and bioinformatic mining of Aspergillus sp. MF297-2, Aspergillus versicolor NRRL35600, P. fellutanum ATCC20841, and M. aurantiaca RRC1813, respectively. Homology of open reading frames across gene clusters is shown by same-colored arrows. The not and not′ genes in the red box are unlikely to be involved in notoamide biosynthesis.

FIG. 3—Proposed biosynthetic pathway for antipodal notoamide metabolites.

FIG. 4—Proposed biosynthetic pathway for paraherquamide A.

FIG. 5—Proposed biosynthetic pathway for malbrancheamide natural products.

FIG. 6—Summary of divergent NRPS strategies that culminate in the formation of structurally related bicyclo[2.2.2]diazaoctane ring systems in distinct oxidation states.

FIGS. 7A-7C—Sequence Table showing correlation between sequence identification numbers and specific open reading frames.

SUMMARY OF THE INVENTION

The disclosure provides a host cell that produces a prenylated indole alkaloid.

The disclosure provides a host cell transformed with one or more polynucleotides selected from the group consisting of: a polynucleotide encoding SEQ ID NO: 3 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 3 having MalA activity; a polynucleotide encoding SEQ ID NO: 5 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 5 having MalB activity; a polynucleotide encoding SEQ ID NO: 7 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 7 having MalC activity; a polynucleotide encoding SEQ ID NO: 9 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 9 having MalD activity; a polynucleotide encoding SEQ ID NO: 11 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 11 having MalE activity; a polynucleotide encoding SEQ ID NO: 13 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 13 having MalF activity, and a polynucleotide encoding SEQ ID NO: 15 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 15 having MalG activity.
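The percent-identity tiers recited above can be illustrated with a minimal sketch. The disclosure does not specify an alignment tool or identity convention in this passage, so the sketch makes assumptions: the two protein sequences are already globally aligned with '-' as the gap character, and identity is counted over all alignment columns except those where both sequences have a gap. The function names and the short peptide strings are hypothetical and are not sequences from the Sequence Listing.

```python
# Identity tiers recited in the text, highest first.
THRESHOLDS = (98, 97, 96, 95, 90, 85, 80, 75)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences have a gap are
    ignored; a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def thresholds_met(identity):
    """Return the recited tiers (in percent) that the identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical ten-residue peptides differing at one position.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
identity = percent_identity(reference, variant)
print(identity, thresholds_met(identity))
```

Sequence identity alone does not satisfy the language above: a variant must also retain the named enzymatic activity, which this sketch does not assess.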

The disclosure further provides a host cell transformed with one or more polynucleotides selected from the group consisting of: a polynucleotide encoding SEQ ID NO: 18 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 18 having NotA activity; a polynucleotide encoding SEQ ID NO: 20 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 20 having NotB activity; a polynucleotide encoding SEQ ID NO: 22 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 22 having NotC activity; a polynucleotide encoding SEQ ID NO: 24 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 24 having NotD activity; a polynucleotide encoding SEQ ID NO: 26 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 26 having NotE activity; a polynucleotide encoding SEQ ID NO: 28 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 28 having NotF activity; a polynucleotide encoding SEQ ID NO: 30 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 30 having NotG activity; a polynucleotide encoding SEQ ID NO: 32 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 32 having NotH activity; a polynucleotide encoding SEQ ID NO: 34 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 34 having NotI activity; 
a polynucleotide encoding SEQ ID NO: 36 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 36 having NotJ activity; a polynucleotide encoding SEQ ID NO: 38 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 38 having NotK activity; a polynucleotide encoding SEQ ID NO: 40 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 40 having NotL activity; a polynucleotide encoding SEQ ID NO: 42 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 42 having NotM activity; a polynucleotide encoding SEQ ID NO: 44 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 44 having NotN activity; a polynucleotide encoding SEQ ID NO: 46 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 46 having NotO activity; a polynucleotide encoding SEQ ID NO: 48 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 48 having NotP activity; a polynucleotide encoding SEQ ID NO: 50 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 50 having NotQ activity, and a polynucleotide encoding SEQ ID NO: 52 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 52 having NotR activity.

The disclosure further provides a host cell transformed with one or more polynucleotides selected from the group consisting of: a polynucleotide encoding SEQ ID NO: 55 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 55 having phqA activity; a polynucleotide encoding SEQ ID NO: 57 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 57 having phqB activity; a polynucleotide encoding SEQ ID NO: 59 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 59 having phqC activity; a polynucleotide encoding SEQ ID NO: 61 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 61 having phqD activity; a polynucleotide encoding SEQ ID NO: 63 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 63 having phqE activity; a polynucleotide encoding SEQ ID NO: 65 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 65 having phqF activity; a polynucleotide encoding SEQ ID NO: 67 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 67 having phqG activity; a polynucleotide encoding SEQ ID NO: 69 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 69 having phqH activity; a polynucleotide encoding SEQ ID NO: 71 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 71 having phqI activity; 
a polynucleotide encoding SEQ ID NO: 73 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 73 having phqJ activity; a polynucleotide encoding SEQ ID NO: 75 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 75 having phqK activity; a polynucleotide encoding SEQ ID NO: 77 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 77 having phqL activity; a polynucleotide encoding SEQ ID NO: 79 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 79 having phqM activity; a polynucleotide encoding SEQ ID NO: 81 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 81 having phqN activity, and a polynucleotide encoding SEQ ID NO: 83 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 83 having phqO activity.

The disclosure also provides a host cell transformed with one or more polynucleotides selected from the group consisting of: a polynucleotide encoding SEQ ID NO: 3 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 3 having MalA activity; a polynucleotide encoding SEQ ID NO: 5 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 5 having MalB activity; a polynucleotide encoding SEQ ID NO: 7 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 7 having MalC activity; a polynucleotide encoding SEQ ID NO: 9 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 9 having MalD activity; a polynucleotide encoding SEQ ID NO: 11 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 11 having MalE activity; a polynucleotide encoding SEQ ID NO: 13 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 13 having MalF activity; a polynucleotide encoding SEQ ID NO: 15 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 15 having MalG activity; a polynucleotide encoding SEQ ID NO: 18 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 18 having NotA activity; a polynucleotide encoding SEQ ID NO: 20 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 20 having NotB activity; a 
polynucleotide encoding SEQ ID NO: 22 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 22 having NotC activity; a polynucleotide encoding SEQ ID NO: 24 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 24 having NotD activity; a polynucleotide encoding SEQ ID NO: 26 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 26 having NotE activity; a polynucleotide encoding SEQ ID NO: 28 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 28 having NotF activity; a polynucleotide encoding SEQ ID NO: 30 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 30 having NotG activity; a polynucleotide encoding SEQ ID NO: 32 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 32 having NotH activity; a polynucleotide encoding SEQ ID NO: 34 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 34 having NotI activity; a polynucleotide encoding SEQ ID NO: 36 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 36 having NotJ activity; a polynucleotide encoding SEQ ID NO: 38 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 38 having NotK activity; a polynucleotide encoding SEQ ID NO: 40 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or 
more, 80% or more, or 75% or more identical to SEQ ID NO: 40 having NotL activity; a polynucleotide encoding SEQ ID NO: 42 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 42 having NotM activity; a polynucleotide encoding SEQ ID NO: 44 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 44 having NotN activity; a polynucleotide encoding SEQ ID NO: 46 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 46 having NotO activity; a polynucleotide encoding SEQ ID NO: 48 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 48 having NotP activity; a polynucleotide encoding SEQ ID NO: 50 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 50 having NotQ activity; a polynucleotide encoding SEQ ID NO: 52 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 52 having NotR activity; a polynucleotide encoding SEQ ID NO: 55 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 55 having phqA activity; a polynucleotide encoding SEQ ID NO: 57 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 57 having phqB activity; a polynucleotide encoding SEQ ID NO: 59 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 59 having phqC activity; a polynucleotide encoding SEQ ID NO: 61 or a 
protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 61 having phqD activity; a polynucleotide encoding SEQ ID NO: 63 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 63 having phqE activity; a polynucleotide encoding SEQ ID NO: 65 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 65 having phqF activity; a polynucleotide encoding SEQ ID NO: 67 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 67 having phqG activity; a polynucleotide encoding SEQ ID NO: 69 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 69 having phqH activity; a polynucleotide encoding SEQ ID NO: 71 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 71 having phqI activity; a polynucleotide encoding SEQ ID NO: 73 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 73 having phqJ activity; a polynucleotide encoding SEQ ID NO: 75 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 75 having phqK activity; a polynucleotide encoding SEQ ID NO: 77 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 77 having phqL activity; a polynucleotide encoding SEQ ID NO: 79 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to 
SEQ ID NO: 79 having phqM activity; a polynucleotide encoding SEQ ID NO: 81 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 81 having phqN activity, and a polynucleotide encoding SEQ ID NO: 83 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 83 having phqO activity.

The disclosure also provides a MalA protein having the amino acid sequence set out in SEQ ID NO: 3 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 3 having MalA activity.

The disclosure also provides a MalB protein having the amino acid sequence set out in SEQ ID NO: 5 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 5 having MalB activity.

The disclosure also provides a MalC protein having the amino acid sequence set out in SEQ ID NO: 7 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 7 having MalC activity.

The disclosure also provides a MalD protein having the amino acid sequence set out in SEQ ID NO: 9 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 9 having MalD activity.

The disclosure also provides a MalE protein having the amino acid sequence set out in SEQ ID NO: 11 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 11 having MalE activity.

The disclosure also provides a MalF protein having the amino acid sequence set out in SEQ ID NO: 13 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 13 having MalF activity.

The disclosure also provides a MalG protein having the amino acid sequence set out in SEQ ID NO: 15 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 15 having MalG activity.

The disclosure also provides a NotA protein having the amino acid sequence set out in SEQ ID NO: 18 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 18 having NotA activity.

The disclosure also provides a NotB protein having the amino acid sequence set out in SEQ ID NO: 20 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 20 having NotB activity.

The disclosure also provides a NotC protein having the amino acid sequence set out in SEQ ID NO: 22 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 22 having NotC activity.

The disclosure also provides a NotD protein having the amino acid sequence set out in SEQ ID NO: 24 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 24 having NotD activity.

The disclosure also provides a NotE protein having the amino acid sequence set out in SEQ ID NO: 26 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 26 having NotE activity.

The disclosure also provides a NotF protein having the amino acid sequence set out in SEQ ID NO: 28 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 28 having NotF activity.

The disclosure also provides a NotG protein having the amino acid sequence set out in SEQ ID NO: 30 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 30 having NotG activity.

The disclosure also provides a NotH protein having the amino acid sequence set out in SEQ ID NO: 32 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 32 having NotH activity.

The disclosure also provides a NotI protein having the amino acid sequence set out in SEQ ID NO: 34 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 34 having NotI activity.

The disclosure also provides a NotJ protein having the amino acid sequence set out in SEQ ID NO: 36 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 36 having NotJ activity.

The disclosure also provides a NotK protein having the amino acid sequence set out in SEQ ID NO: 38 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 38 having NotK activity.

The disclosure also provides a NotL protein having the amino acid sequence set out in SEQ ID NO: 40 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 40 having NotL activity.

The disclosure also provides a NotM protein having the amino acid sequence set out in SEQ ID NO: 42 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 42 having NotM activity.

The disclosure also provides a NotN protein having the amino acid sequence set out in SEQ ID NO: 44 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 44 having NotN activity.

The disclosure also provides a NotO protein having the amino acid sequence set out in SEQ ID NO: 46 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 46 having NotO activity.

The disclosure also provides a NotP protein having the amino acid sequence set out in SEQ ID NO: 48 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 48 having NotP activity.

The disclosure also provides a NotQ protein having the amino acid sequence set out in SEQ ID NO: 50 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 50 having NotQ activity.

The disclosure also provides a NotR protein having the amino acid sequence set out in SEQ ID NO: 52 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 52 having NotR activity.

The disclosure also provides a phqA protein having the amino acid sequence set out in SEQ ID NO: 55 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 55 having phqA activity.

The disclosure also provides a phqB protein having the amino acid sequence set out in SEQ ID NO: 57 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 57 having phqB activity.

The disclosure also provides a phqC protein having the amino acid sequence set out in SEQ ID NO: 59 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 59 having phqC activity.

The disclosure also provides a phqD protein having the amino acid sequence set out in SEQ ID NO: 61 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 61 having phqD activity.

The disclosure also provides a phqE protein having the amino acid sequence set out in SEQ ID NO: 63 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 63 having phqE activity.

The disclosure also provides a phqF protein having the amino acid sequence set out in SEQ ID NO: 65 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 65 having phqF activity.

The disclosure also provides a phqG protein having the amino acid sequence set out in SEQ ID NO: 67 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 67 having phqG activity.

The disclosure also provides a phqH protein having the amino acid sequence set out in SEQ ID NO: 69 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 69 having phqH activity.

The disclosure also provides a phqI protein having the amino acid sequence set out in SEQ ID NO: 71 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 71 having phqI activity.

The disclosure also provides a phqJ protein having the amino acid sequence set out in SEQ ID NO: 73 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 73 having phqJ activity.

The disclosure also provides a phqK protein having the amino acid sequence set out in SEQ ID NO: 75 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 75 having phqK activity.

The disclosure also provides a phqL protein having the amino acid sequence set out in SEQ ID NO: 77 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 77 having phqL activity.

The disclosure also provides a phqM protein having the amino acid sequence set out in SEQ ID NO: 79 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 79 having phqM activity.

The disclosure also provides a phqN protein having the amino acid sequence set out in SEQ ID NO: 81 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 81 having phqN activity.

The disclosure also provides a phqO protein having the amino acid sequence set out in SEQ ID NO: 83 or a protein 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more identical to SEQ ID NO: 83 having phqO activity.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 2 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 4 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 6 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 8 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 10 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 12 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 14 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 17 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 19 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 21 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 23 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 25 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 27 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 29 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 31 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 33 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 35 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 37 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 39 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 41 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 43 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 45 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 47 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 49 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 51 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 54 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 56 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 58 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 60 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 62 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 64 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 66 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 68 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 70 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 72 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 74 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 76 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 78 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 80 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide set out in SEQ ID NO: 82 or a polynucleotide 98% or more, 97% or more, 96% or more, 95% or more, 90% or more, 85% or more, 80% or more, or 75% or more homologous thereto.

The disclosure also provides a polynucleotide encoding any one of the proteins of the disclosure.

The disclosure also provides an expression vector comprising a polynucleotide of the disclosure.

The disclosure also provides a host cell transformed with an expression vector of the disclosure or a polynucleotide of the disclosure.

The disclosure also provides a method for producing a prenylated indole alkaloid or a metabolic intermediate for producing a prenylated indole alkaloid comprising the step of growing a host cell of the disclosure under conditions to express the protein encoded by the transformed polynucleotide and producing the prenylated indole alkaloid or the metabolic intermediate for producing the prenylated indole alkaloid. In various aspects, the method further comprises the step of isolating the prenylated indole alkaloid or the metabolic intermediate of the prenylated indole alkaloid. In various aspects, the host cell is a prokaryote. In various aspects, the host cell is selected from the group consisting of E. coli, Streptomyces lavendulae, Myxococcus xanthus, and Pseudomonas fluorescens.

DESCRIPTION OF THE INVENTION

“Sequence identity” means that two amino acid or polynucleotide sequences are identical over a region of comparison, such as a region of at least about 250 residues or bases. Optionally, the region of identity spans at least about 100-500 residues or bases, and spans the active domain of the polypeptide. Several methods of conducting sequence alignment are known in the art and include, for example, the homology alignment algorithm (Needleman & Wunsch, J. Mol. Biol., 48, 443 (1970)); the local homology algorithm (Smith & Waterman, Adv. Appl. Math., 2, 482 (1981)); and the search for similarity method (Pearson & Lipman, Proc. Natl. Acad. Sci. USA, 85, 2444 (1988)). Preferably, the algorithm used to determine percent sequence identity and sequence similarity is the BLAST algorithm (Altschul et al., J. Mol. Biol., 215, 403-410 (1990); Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA, 89, 10915 (1992); Karlin & Altschul, Proc. Natl. Acad. Sci. USA, 90, 5873-5877 (1993)). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. Other examples of alignment software, including GAP, BESTFIT, FASTA, PILEUP, and TFASTA provided by the Wisconsin Genetics Software Package (Genetics Computer Group, 575 Science Dr., Madison, Wis.), and CLUSTALW (Thompson et al., Nucl. Acids Res., 22, 4673-4680 (1994)), are known in the art. The degree of homology (percent identity) between a native and a mutant sequence may be determined, for example, by comparing the two sequences using computer programs commonly employed for this purpose. Briefly, the GAP program defines identity as the number of aligned symbols (i.e., nucleotides or amino acids) which are identical, divided by the total number of symbols in the shorter of the two sequences.
The default parameters for the GAP program include: (1) a unary comparison matrix (containing a value of 1 for identities and 0 for non-identities) for nucleotides, and the weighted comparison matrix of Gribskov and Burgess, Nucl. Acids Res. 14:6745, 1986, as described by Schwartz and Dayhoff, eds., Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, pp. 353-358, 1979; (2) a penalty of 3.0 for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps.
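
By way of illustration only, the identity measure described above (identical aligned symbols divided by the number of symbols in the shorter sequence) may be sketched in Python as follows. The function names, the use of "-" as the gap symbol, and the toy peptides are assumptions introduced for this sketch; they are not taken from the GAP program or from the Sequence Listing, and the sketch presumes that the two sequences have already been aligned by one of the methods cited above.

```python
def gap_style_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity for two pre-aligned sequences.

    Identity = identical aligned symbols / symbols in the shorter sequence,
    following the definition given in the text. Gap symbols never count as
    matches and are not counted in the sequence lengths.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one symbol")
    return 100.0 * matches / shorter


def meets_threshold(percent_identity: float, threshold: float = 75.0) -> bool:
    """True if the identity reaches a recited cutoff (e.g. 75, 80, ... 98)."""
    return percent_identity >= threshold


# Hypothetical pre-aligned toy peptides, for illustration only.
toy_a = "MKTAYIAKQR"
toy_b = "MKTAY-AKQK"
pid = gap_style_identity(toy_a, toy_b)  # 8 matches over 9 residues
print(f"{pid:.1f}% identical; >=85%: {meets_threshold(pid, 85.0)}")
```

In this toy case eight of the aligned positions match and the shorter sequence has nine residues, so the pair would satisfy the 85% level but not the 90% level.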

Alterations of the native amino acid sequence may be accomplished by any of a number of known techniques. Mutations can be introduced at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes an analog having the desired amino acid insertion, substitution, or deletion.

Alternatively, oligonucleotide-directed site-specific mutagenesis procedures can be employed to provide an altered gene having particular codons altered according to the substitution, deletion, or insertion required. Techniques for making such alterations include those disclosed by Walder et al. (Gene 42:133, 1986); Bauer et al. (Gene 37:73, 1985); Craik (BioTechniques, January 1985, 12-19); Smith et al. (Genetic Engineering: Principles and Methods, Plenum Press, 1981); and U.S. Pat. Nos. 4,518,584 and 4,737,462, which are incorporated by reference herein.

The disclosure provides an example of the comparative analysis of biosynthetic gene clusters (mined from the whole genome) and pathways for structurally related fungal indole alkaloids bearing the unusual bicyclo[2.2.2]diazaoctane core, including the anticancer agents (−)-notoamide A ((−)-1) and (+)-notoamide A ((+)-1),^(35,36) the anthelmintic paraherquamide A (2),³⁷⁻³⁹ and the calmodulin-inhibitor malbrancheamide⁴⁰⁻⁴² (3) (FIG. 1A) produced by Aspergillus sp. MF297-2,⁴³ Aspergillus versicolor NRRL35600, Penicillium fellutanum ATCC20841, and Malbranchea aurantiaca RRC1813, respectively. These fungal natural products are assembled from L-tryptophan, a second cyclic amino acid residue, and one or two isoprene units through biosynthetic pathways that are proposed to feature an intriguing intramolecular Diels-Alderase (IMDAse) and a number of unique enantiomerically selective enzymes.⁴⁴⁻⁴⁹ The diverse bioactivities of this natural product family suggest that elucidation of their biosynthesis could direct future structural diversification via biosynthetic engineering, thereby leading to enhanced biological activities.

This comparative analysis provides significant insights into a number of intriguing biosynthetic questions: (1) which enzyme in each pathway is likely responsible for the formation of the bicyclo[2.2.2]diazaoctane core via the proposed intramolecular [4+2] Diels-Alder (IMDA) cyclization; (2) which enzyme in the pathway of 1 and 2 installs the spiro-oxindole functionality via a putative epoxide-initiated Pinacol-type rearrangement; and (3) what genetic difference controls formation of the dioxopiperazine in 1 versus the monooxopiperazine in 2 and 3.

The most significant structural similarity between 1-3 is the bicyclo[2.2.2]diazaoctane core (FIG. 1A). Biosynthetically, this unique structural moiety was proposed to arise from a [4+2] IMDA reaction (FIG. 1B).^(44,46) This presumed cycloaddition reaction is also believed to be the first enantiodivergent step in an otherwise common biosynthetic pathway from Aspergillus sp. MF297-2 and A. versicolor NRRL35600, leading to formation of (−)-1 and (+)-1, respectively, together with several other enantiomeric metabolites (FIG. 3).⁴⁷ Currently, it remains unknown whether a specific IMDAse indeed exists in these biosynthetic pathways. However, if it does exist, one would first expect its encoding gene to be present in all four gene clusters. Second, the spiro-oxindole is absent in 3, suggesting the responsible enzyme is likely absent from the pathway for 3, and present in those for 1 and 2. Third, a specific reductase responsible for reducing the tryptophan carbonyl group would be expected in the gene clusters of 2 and 3, but not 1. This genetic difference would account for the lack of the second amide carbonyl group in the piperazine ring of 2 and 3. Finally, the different hydroxylation status of the indole amide and the distinct aromatic decoration among 1-3, together with other unique structural features including the tailoring of the proline moiety and N-methylation in 2, are also expected to be reflected at the genetic level.

The following examples are provided to illustrate particular embodiments of the present invention, and are not to be construed as limiting the scope of the invention.

EXAMPLE 1

The genomes of A. versicolor NRRL35600, P. fellutanum ATCC20841, and M. aurantiaca RRC1813A, harboring the not′, phq, and mal gene clusters, respectively, were sequenced to approximately 99, 84, and 181 times coverage of their estimated genome size (35 Mb), using the Illumina Solexa technology (Genome Analyzer IIx).
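
For orientation only, the approximate amount of sequence data implied by these coverage figures may be estimated with the short Python sketch below. It assumes the conventional definition of fold coverage (total sequenced bases divided by genome size) and applies the single 35 Mb estimate stated above to all three genomes; the helper name sequenced_bases is hypothetical, and the resulting totals are back-calculated estimates rather than reported values.

```python
GENOME_SIZE_BP = 35_000_000  # estimated genome size stated in the text (35 Mb)

# Approximate fold coverage reported for each sequenced strain.
fold_coverage = {
    "A. versicolor NRRL35600": 99,
    "P. fellutanum ATCC20841": 84,
    "M. aurantiaca RRC1813A": 181,
}


def sequenced_bases(fold: int, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Total bases implied by a fold coverage (coverage = bases / genome size)."""
    return fold * genome_size_bp


for strain, fold in fold_coverage.items():
    gigabases = sequenced_bases(fold) / 1e9
    print(f"{strain}: about {gigabases:.2f} Gb of sequence at {fold}x coverage")
```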

First, the key biosynthetic gene notE′ (Table 1) encoding a non-ribosomal peptide synthetase (NRPS) was mined from the genome sequences using the notE DNA sequence from the reported not gene cluster⁴³ as a probe for homologous genes. NotE′, which shows 79% identity and 86% similarity to NotE at the amino acid (AA) level, was predicted to be a bimodular NRPS with the A-T-C-A-T-C (A: adenylation, T: thiolation, C: condensation) domain organization using the PKS/NRPS Analyzer. Genome walking from notE′ toward the 5′ and 3′ ends identified another nine genes (notA′-J′, Table 1 and FIG. 2) that display high AA sequence similarity (>70%) with the corresponding gene products of the not gene cluster. Notably, the overall nucleotide identity between notA′-J′ (25,440 bp) and notA-J (26,210 bp) is 71%, which is not surprising since both metabolic pathways are responsible for assembling “identical”, yet antipodal compounds. In addition to the high sequence similarity, the genetic architecture (i.e. order and direction of genes) within this region is identical in the two clusters (FIG. 2). The exon/intron arrangement of the corresponding genes is also highly similar between the two clusters (see Supplementary Information). In contrast, the sequence similarity is reduced drastically and the gene architecture differs after notK′/notK (Table 1, FIG. 2), suggesting that the previously assigned not gene cluster (notA-R) probably ends at notJ.

At the genetic level, it is not possible to glean the key differences that account for production of antipodal notoamide metabolites, suggesting that subtle active site sequence variation in those enantiomerically selective enzymes plays a critical role in the control of absolute chirality. This requires direct biochemical analysis of the key notoamide biosynthetic enzymes, including structural biology efforts, which is currently ongoing in our laboratories.

Second, the paraherquamide (phq) gene cluster (47,884 bp) was identified from the partially assembled P. fellutanum genome by using a select group of not genes including the NRPS gene notE, the prenyltransferase genes notC and notF, and the P450 monooxygenase gene notG as in silico probes.⁴³ Fifteen genes were identified that are likely involved in paraherquamide biosynthesis. The largest number of biosynthetic genes among the four studied metabolic pathways is consistent with 2 being the most complex structure compared to 1 and 3. Comparative bioinformatic analysis demonstrates that nine (phqA, B, F, G, H, J, K, L, and M) out of fifteen total phq genes are homologous to corresponding not genes (Table 1), although their homology is significantly lower than that between not and not′ genes. Notably, the bimodular phqB NRPS gene is different from notE in that a reductase (R) domain is located at its carboxy terminus instead of a condensation (C) domain, which is found in notE and notE′. This difference is significant because the reductase (vs condensation) domain is presumed to account for the presence of the monooxopiperazine in 2 (vs dioxopiperazine in 1) (see below).⁵⁰ Among the remaining six cluster-specific genes, phqC shows high sequence similarity to 2-oxoglutarate (2OG) and Fe(II) dependent oxygenases.^(51,52) The phqD and phqE genes, which putatively encode a pyrroline-5-carboxylate reductase and a short chain dehydrogenase, respectively, might be involved in the formation of the β-methyl-proline starter unit. The phqI gene that encodes the third prenyltransferase in phq is unique as it is free of introns, and therefore, distinct from the single intron-containing prenyltransferase genes phqA/notC and phqJ/notF. It is worth noting that the presence of three prenyltransferase genes is inconsistent with the two isoprene groups incorporated into the structure of 2.
Thus, it is of special interest to examine whether the third prenyltransferase gene is redundant or plays an alternative, as yet unknown, function in the biosynthesis of 2. Furthermore, phqN is predicted to function as a methyltransferase, likely responsible for the N-methylation in 2. Finally, the phqO P450 gene with a unique exon/intron organization pattern is hypothesized to catalyze the C14 hydroxylation of the β-methyl-proline moiety.
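
The relationship proposed above between the C-terminal NRPS domain and the piperazine oxidation state may be illustrated by the following minimal Python sketch. The domain strings are those given in this disclosure for NotE, NotE′, PhqB, and MalG; the function names and the simple terminal-domain rule are assumptions made for illustration and do not represent the PKS/NRPS Analyzer or any other annotation tool.

```python
DOMAIN_NAMES = {
    "A": "adenylation",
    "T": "thiolation",
    "C": "condensation",
    "R": "reductase",
}


def parse_domains(organization: str) -> list:
    """Split a domain string such as 'A-T-C-A-T-R' into single-letter codes."""
    domains = organization.split("-")
    unknown = [d for d in domains if d not in DOMAIN_NAMES]
    if unknown:
        raise ValueError(f"unrecognized domain code(s): {unknown}")
    return domains


def module_count(organization: str) -> int:
    """Count modules, assuming one adenylation domain per module."""
    return parse_domains(organization).count("A")


def predicted_piperazine(organization: str) -> str:
    """Apply the hypothesis in the text: terminal C -> dioxo, terminal R -> monooxo."""
    terminal = parse_domains(organization)[-1]
    if terminal == "C":
        return "dioxopiperazine"
    if terminal == "R":
        return "monooxopiperazine"
    return "unassigned"


# Domain organizations as stated in the disclosure.
nrps = {
    "NotE": "A-T-C-A-T-C",
    "NotE'": "A-T-C-A-T-C",
    "PhqB": "A-T-C-A-T-R",
    "MalG": "A-T-C-A-T-R",
}

for name, organization in nrps.items():
    print(name, module_count(organization), predicted_piperazine(organization))
```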

Third, the seven-gene mal gene cluster (20,179 bp) was mined from the genome of Malbranchea aurantiaca RRC1813A using phqB as an in silico probe to identify the metabolic system for 3. It has the smallest size among gene clusters of 1-3, which is consistent with the simplest structure and corresponding biosynthetic pathway. The genes malB, malD, malE, malF, and malG are common to the four gene clusters. Thus, except for the regulatory gene malD (homologous to notA, notA′ and phqG), the remaining four biosynthetic genes (and their homologues in not, not′ and phq) are possibly responsible for installing the shared structural features of 1-3. This strongly suggests that the hypothetical Diels-Alderase (if extant) should be represented by one of these four gene products (see below). Interestingly, the mal genes show greater sequence similarity to phq genes than to not (or not′) genes, perhaps indicating their closer evolutionary relationship. Similar to PhqB, the NRPS MalG harbors a reductase domain at its carboxy terminus, which is consistent with the monooxopiperazine moiety in 3. Again, the apparent redundancy of the second prenyltransferase (3 contains only one isoprene group) is difficult to rationalize, but genetic disruption or RNA silencing (malB or malE) efforts are likely to shed light on the individual roles of these enzymes. Finally, the flavin-dependent halogenase MalA is likely involved in the introduction of one or both chlorine atoms in the biosynthesis of 3.

EXAMPLE 2

Since the discovery of the biosynthetic gene cluster of (−)-1 from marine Aspergillus sp. MF297-2, in vitro biochemical characterization of the reverse prenyltransferase NotF using the NRPS (NotE) product brevianamide F⁵³ (4) as substrate and the normal prenyltransferase NotC using 6-hydroxy-deoxybrevianamide E (6) as substrate has partially established the early steps of the notoamide pathway leading to notoamide S (7) (FIG. 3).⁴³ The P450 monooxygenase NotG likely catalyzes the C6 indole hydroxylation, since its close homologue FtmC (59%/72% identity/similarity) in fumitremorgin biosynthesis has been characterized to hydroxylate the analogous aromatic C-H bond in the indole ring of tryprostatin B,^(54,55) which is structurally similar to deoxybrevianamide E (5).⁵⁶

As the proposed pivotal branching point in notoamide biosynthesis,^(47,57,58) 7 can be diverted to notoamide E (8) through an oxidative pyran ring closure putatively catalyzed by either NotH P450 monooxygenase (based on precedented examples of pyran ring formation from the epoxide intermediate generated by P450 enzymes⁵⁹), or the NotD oxidoreductase. This step would be followed by an indole 2,3-epoxidation-initiated Pinacol-like rearrangement catalyzed by NotB FAD monooxygenase (FMO) leading to the formation of notoamide C (9) and notoamide D (10).⁵⁸ Notably, notB (or notB′) is only observed in the not (or not′) gene cluster, consistent with the fact that this branching pathway leading to natural products 9 and 10 is only observed in notoamide biosynthesis.

On the other hand, extensive precursor feeding and incorporation studies using stable isotopically labeled intermediates have supported 7 as the substrate for the hypothetical IMDA.⁴⁷ As a working hypothesis, a two-electron oxidation catalyzed by an oxidase would give rise to the achiral azadiene intermediate (11), which may immediately undergo a spontaneous stereoselective [4+2] IMDA cyclization in the active site of the same oxidase, yielding either (+)-notoamide T ((+)-12) in Aspergillus sp. MF297-2 or (−)-notoamide T ((−)-12) in A. versicolor. The opposing conformation (endo/exo) assumed by achiral 11, presumably determined by the scaffolding of each putative Diels-Alderase, might account for the enantio-divergence at this key step. The five oxidases encoded by the not gene cluster include the FMOs NotB and NotI, the P450 enzymes NotG and NotH, and the FAD-dependent oxidoreductase NotD. NotB was recently identified as the notoamide E oxidase.⁵⁸ NotI is highly similar to NotB with 42% protein sequence identity and 59% similarity, and is predicted to catalyze a similar conversion from (+)-stephacidin A⁶⁰ ((+)-13) to (−)-notoamide B ((−)-14) via the 2,3-epoxidation of (+)-13 followed by a Pinacol-type rearrangement. Thus, if the putative function of NotG (see above) is correct, NotH (or NotD) is likely the bifunctional oxidase that also functions as the IMDAse responsible for generation of (+)-12. To generate antipodal (−)-12, NotH′ (or NotD′) is expected to catalyze a Diels-Alder reaction leading to the opposite stereochemistry. Currently, this hypothesis is being tested in our laboratories through in vitro characterization of NotH/NotH′ (or NotD/NotD′). With comparative analysis of four gene clusters (Table 1), it appears that NotD/NotD′ is more likely to serve as the IMDAse since its homologs (PhqH and MalF) are present in all clusters.
This hypothesis is based on the assumption that these four biosynthetic pathways use the same type of protein scaffolding enzyme to catalyze the [4+2] cycloaddition. However, we have recently begun to challenge this assumption (see below). Presently, the possibility that NotH/NotH′ functions as the IMDAse in notoamide biosynthesis cannot be excluded. Once its identity is determined, the final oxidase NotD (or NotH) will likely be found to catalyze the oxidative pyran ring formation (FIG. 3).
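
The elimination reasoning set out above may be summarized, purely as an illustrative sketch, by the following Python fragment. The role labels paraphrase the putative assignments given in this example, and the homolog table records only the relationships stated in this disclosure (PhqH and MalF for NotD, PhqM for NotH); treating the absence of a stated mal homolog of NotH as a true absence is an assumption of the sketch. As noted above, the result rests on the further assumption that all four pathways use the same type of enzyme scaffold, and NotH/NotH′ cannot presently be excluded.

```python
# Putative roles of the five not-cluster oxidases; None means no role assigned
# before the IMDA step is considered.
not_oxidases = {
    "NotB": "notoamide E oxidase (identified)",
    "NotI": "stephacidin A to notoamide B conversion (predicted)",
    "NotG": "C6 indole hydroxylation (putative)",
    "NotH": None,
    "NotD": None,
}

# Homologs stated in the disclosure. Assumption: a cluster not listed for a
# given enzyme lacks a homolog of that enzyme.
homologs = {
    "NotD": {"phq": "PhqH", "mal": "MalF"},
    "NotH": {"phq": "PhqM"},
}

REQUIRED_CLUSTERS = {"phq", "mal"}

# Step 1: oxidases without an assigned role remain IMDAse candidates.
unassigned = [name for name, role in not_oxidases.items() if role is None]

# Step 2: keep candidates whose homologs occur in every other cluster.
conserved = [
    name
    for name in unassigned
    if REQUIRED_CLUSTERS <= set(homologs.get(name, {}))
]

print("Unassigned oxidases:", unassigned)
print("Conserved across clusters:", conserved)
```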

Another important feature of these two related notoamide pathways is that enzymes catalyzing the biosynthetic steps after formation of 12 must also be enantiomerically and diastereochemically selective. Specifically, in previous precursor incorporation studies of racemic ¹³C-labeled (±)-13 with Aspergillus sp. MF297-2 and A. versicolor,⁶¹ it was ascertained that only one enantiomer of 13 can be processed (currently presumed to be by NotI and NotI′) to form downstream products. Understanding the subtle differences between these two enzymes will likely provide significant insights into how related enzymes have evolved to adopt opposing enantiomeric selectivity.

Finally, it remains unclear which enzyme could be responsible for the final hydroxylation steps leading to notoamide A (1) and sclerotiamide⁶² (15) since all five oxidative enzymes in the not(′) gene cluster have been assigned a putative function. It is possible that 1 and 15 are opportunistically produced through the activity of unknown oxidases whose genes reside outside of the defined notoamide gene cluster. Alternatively, the possibility that a not oxidase may possess bi-functionality cannot be excluded.

EXAMPLE 3

Previous feeding studies demonstrated that L-isoleucine is the precursor to the β-methyl-β-hydroxy proline moiety in 2.^(45,63) Identification of the pyrroline-5-carboxylate reductase PhqD and the short chain dehydrogenase PhqE from the phq cluster suggests a reasonable pathway from L-isoleucine to β-methyl proline (FIG. 4). Similar to the partially identified biosynthesis of 4-methyl proline in cyanobacterial Nostoc sp.,⁶⁴ PhqE presumably oxidizes L-isoleucine that has been terminally hydroxylated by an unknown enzyme to the corresponding aldehyde. Spontaneous cyclization and dehydration would yield the 4-methyl pyrroline-5-carboxylic acid, which is then reduced by PhqD leading to the β-methyl proline precursor.

The presence of a C-terminal NAD(P)-dependent reductase domain in the bimodular paraherquamide NRPS (A-T-C-A-T-R) clearly indicates that the mechanism for dipeptide release by PhqB must differ from that mediated by the final condensation domain of NotE (FIG. 3).⁵⁰ What likely occurs is that the PhqB R domain utilizes NADPH for hydride transfer to reduce the thioester bond of the T domain-tethered linear dipeptide to a hemithioaminal intermediate, which spontaneously cleaves the C-S bond to release the aldehyde product. Subsequently, the acid-activated aldehyde is intramolecularly trapped by the nucleophilic amine from the adjacent amino acid to form a hemiaminal intermediate, which then undergoes a spontaneous dehydration and double bond rearrangement leading to formation of the monooxopiperazine intermediate 16 (likely existing as the enol form) prior to all other biosynthetic steps. This hypothesis is in good agreement with previous observations^(65,66) that the dioxopiperazine analog of preparaherquamide (17) cannot be incorporated into 2 by P. fellutanum since all substrates for downstream enzymes should bear the monooxopiperazine ring system. In this scheme (FIG. 4), formation of the diene in 16 is achieved by a reductive process, as opposed to the 2e⁻ oxidation step proposed in the notoamide biosynthetic pathway (FIG. 3). If this is correct, in contrast to an oxidase (NotH/NotH′ or NotD/NotD′) proposed to be the Diels-Alderase in notoamide biosynthesis, the reverse prenyltransferase (proposed to be PhqJ) might act as the scaffold for an IMDA reaction after introduction of the reverse prenyl group to 16. In this proposed route, the terminal double bond of the isoprene group would become the dienophile to react with the azadiene in the prenyltransferase active site, thus resulting in formation of the bicyclo[2.2.2]diazaoctane intermediate 17.

Following formation of 17, the pyran ring is proposed to be installed by the prenyltransferase PhqA (22% identical to NotC), the P450 monooxygenase PhqL (29% identical to NotG) and either the oxidoreductase PhqH (34% identical to NotD) or the P450 monooxygenase PhqM (15% identical to NotH). The FMO PhqK (32% identical to NotI) is likely responsible for generation of the spiro-oxindole, and the N-methylation is likely mediated by the PhqN methyltransferase, leading to the isolable natural product paraherquamide F^(38,67) (18). However, the order of these biosynthetic steps cannot be predicted without further in vivo genetic studies and/or in vitro biochemical analysis.
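For readers less familiar with the "% identical" figures quoted here and in Table 1, the following is a minimal sketch of how a pairwise percent identity can be computed from two already-aligned sequences. It assumes the common convention of identical aligned positions divided by alignment length; the text does not state the exact procedure used, and the function name and the toy sequences are hypothetical illustrations, not sequences from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pairwise alignment.

    Both inputs must already be aligned (equal length, gaps marked with
    `gap`). Identity is counted as identical, non-gap columns divided by
    the total number of alignment columns (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment (not real PhqA/NotC sequence data):
# 5 identical columns out of 7 aligned columns.
toy_identity = percent_identity("MKT-LLA", "MRTALLA")
```

Values in the 15-35% range, as reported for several Phq/Not pairs, indicate distant homologs, which is one reason the functional assignments are framed as proposals pending genetic or biochemical confirmation.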

In late-stage paraherquamide biosynthesis, the third P450 monooxygenase PhqO is probably responsible for the C14 hydroxylation, transforming 18 to paraherquamide G^(38,67) (19), and paraherquamide E^(38,67) (20) to the final product 2. However, expansion from the 6-membered pyran ring (in 18 and 19) to the 7-membered dioxepin ring (in 2 and 20) represents a poorly understood but intriguing process. Possibly phqC, which encodes a 2OG-Fe(II)-oxygenase, is involved in this ring expansion, consistent with previous reports showing that this class of enzyme can function as an expandase.⁶⁸

Finally, some genes in the cluster, including phqI and either phqM or phqH (whichever is not involved in pyran ring formation), do not have a clearly assigned role and appear to be redundant.

EXAMPLE 4

Except for using L-proline instead of β-methyl proline as the starter unit, the biosynthetic route through premalbrancheamide (21) (FIG. 5) is proposed to parallel that of paraherquamide biosynthesis through 17 (FIG. 4). Mediated by the NRPS MalG (A-T-C-A-T-R, 37% identical to PhqB) and the prenyltransferase MalE (36%/34% identical to NotF/PhqJ), 21 is produced, differing structurally from 17 in lacking the C1 methyl group.

Subsequently, the halogenase MalA presumably chlorinates the C9 position (malbrancheamide numbering) first to afford the isolable natural product malbrancheamide B (22), which could be further chlorinated by MalA at C8, leading to the final product malbrancheamide (3). This putative pathway is partially supported by the previous feeding study showing that ¹³C labeled 21 can be incorporated into 22 by M. aurantiaca.⁶⁹ The lack of observed ¹³C labeled 3 in the fermentation broth was interpreted to suggest that the second chlorination might be too slow to incorporate detectable levels of ¹³C material from 22 into 3. Notably, the order of these two chlorinations seems not to be interchangeable, since the C8-monochloro regioisomer of 22 (which is C9-monochlorinated) was not detected as a natural product despite considerable effort.⁴² It is also possible that the dichloro species, malbrancheamide, arises from a pre-halogenated tryptophan-based assembly.

BLAST sequence analysis revealed significant homology of MalA to the family of flavin-dependent tryptophan halogenases.⁷⁰⁻⁷³ This result suggests two alternative malbrancheamide biosynthetic pathways. First, MalA could chlorinate tryptophan at C4 and C5 (tryptophan numbering) sequentially prior to its being loaded onto the second T domain of MalG. Then, both monochlorinated and dichlorinated tryptophan could be processed by subsequent assembly enzymes, thereby leading to 22 and 3, respectively, in parallel. Second, MalA might only monochlorinate the C4 position of tryptophan, resulting in 22. Then, 22 is converted into 3 by either MalA or another unidentified halogenase that resides outside mal. To test these hypotheses, it would be best to conduct in vitro functional analysis of purified MalA against selected substrates such as L-tryptophan and 22. Alternatively, whether or not ¹³C labeled 22 can be incorporated into 3 in an in vivo precursor feeding study would also provide useful information about the timing of the two chlorination steps in malbrancheamide biosynthesis.

According to the proposed malbrancheamide biosynthetic pathway (FIG. 5), only three enzymes are required to assemble the final product 3. Inactivation of the seemingly redundant genes, including malB, malC, and malF (Table 1), is currently underway. Interestingly, the MalC short chain dehydrogenase, which is related to PhqE (presumed to participate in preparation of the β-methyl proline starter unit in paraherquamide biosynthesis, see above), is present in the mal gene cluster although apparently unnecessary for malbrancheamide biosynthesis. This implies that malC, together with other redundant genes, might be a remnant of an ancestral or horizontally transferred gene cluster (e.g. one analogous to phq). An evolving biosynthetic gene cluster not only recruits new genes, but also eliminates or retains unused genes when facing a diverse living environment and selection pressure during its evolutionary history.²⁴

Recently, a novel malbrancheamide-type natural product named spiromalbramide (23) (FIG. 5) was isolated from a marine invertebrate-derived Malbranchea graminicola fungal strain.⁷⁴ This new derivative contains the spiro-oxindole moiety that is found in notoamides and paraherquamides, but is absent from malbrancheamides. Based on the comparative analysis of the not, not′, phq, and mal gene clusters, we can now predict that an FMO gene homologous to notI, notI′ or phqK should reside in the uncharacterized biosynthetic gene cluster of 23. Solexa genome sequencing of M. graminicola has been completed. This prediction will be tested as soon as the biosynthetic gene cluster is mined and annotated from the genome sequences.

EXAMPLE 5

In principle, the shared genes from different clusters are responsible for assembling the common structural core among similar natural products. The cluster-specific gene products are presumed to modify these structures by a series of variant tailoring steps, thereby leading to structural diversification. However, it is noteworthy that the redundant genes and multifunctional genes could complicate comparative analysis of gene clusters. Therefore, conclusions can only be unambiguously drawn after genetic and/or biochemical confirmation of enzymatic activities.

Following these simple but logical principles, we performed a comparative analysis of four related gene clusters (not, not′, phq, and mal), based on the proposed complete biosynthetic pathways for (+)/(−)-notoamides, paraherquamides, and malbrancheamides, with a biosynthetic enzyme assigned to each individual step (FIGS. 3-5). For example, the function of the not-specific gene notB can be readily connected to the pathway-specific transformation of notoamide E (8) to notoamide C (9) and D (10). This was recently confirmed by in vitro characterization of the NotB FMO enzyme.⁵⁸
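The bookkeeping behind this kind of comparison can be sketched in a few lines of Python. The homolog assignments and percent identities below are transcribed from Table 1; the function and dictionary names are hypothetical, and the partition only illustrates the shared versus cluster-specific principle. It is not the annotation workflow actually used, and some of the listed identities are low enough that the assignments remain tentative.

```python
# Homolog assignments transcribed from Table 1: each Phq or Mal protein is
# mapped to the Not protein it is listed against, with the % identity.
PHQ_TO_NOT = {
    "PhqA": ("NotC", 22), "PhqB": ("NotE", 26), "PhqF": ("NotK", 18),
    "PhqG": ("NotA", 34), "PhqH": ("NotD", 34), "PhqJ": ("NotF", 32),
    "PhqK": ("NotI", 32), "PhqL": ("NotG", 29), "PhqM": ("NotH", 15),
}
MAL_TO_NOT = {
    "MalB": ("NotC", 28), "MalD": ("NotA", 36), "MalE": ("NotF", 36),
    "MalF": ("NotD", 37), "MalG": ("NotE", 27),
}
NOT_PROTEINS = ["Not" + letter for letter in "ABCDEFGHIJKLMNOPQR"]


def partition_not_proteins(phq_map, mal_map, not_proteins):
    """Group Not proteins by which other clusters list a homolog for them."""
    with_phq = {target for target, _identity in phq_map.values()}
    with_mal = {target for target, _identity in mal_map.values()}
    groups = {"shared_by_all": [], "phq_only": [], "mal_only": [],
              "not_specific": []}
    for protein in not_proteins:
        in_phq, in_mal = protein in with_phq, protein in with_mal
        if in_phq and in_mal:
            groups["shared_by_all"].append(protein)
        elif in_phq:
            groups["phq_only"].append(protein)
        elif in_mal:
            groups["mal_only"].append(protein)
        else:
            groups["not_specific"].append(protein)
    return groups


groups = partition_not_proteins(PHQ_TO_NOT, MAL_TO_NOT, NOT_PROTEINS)
for name, members in groups.items():
    print(name, members)
```

Under these assumptions NotB falls into the group with no listed Phq or Mal homolog, which matches its assignment to a notoamide-specific step, while the NRPS NotE and the prenyltransferases NotC and NotF fall into the group shared by all clusters.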

Furthermore, detailed comparative analysis resulted in nomination of the oxidases NotH and NotH′ (or NotD and NotD′), and the prenyltransferases PhqJ and MalE, as putative Diels-Alderases that catalyze the distinctive IMDA reactions in these pathways. Comparative functional analysis of these enzymes in vitro will next enable us to test the long-standing hypothesis regarding the existence of a Diels-Alderase in the biosynthesis of fungal indole alkaloids with the bicyclo[2.2.2]diazaoctane core. It is striking that Nature has conscripted two evolutionarily related gene cluster paradigms to construct the novel bicyclo[2.2.2]diazaoctane ring system by vastly different mechanistic protocols (FIG. 6). In one instance, for the notoamides, the transformation from the NRPS-loaded dipeptide to the bicyclo[2.2.2]diazaoctane core requires a net two-electron oxidation to reach the key, putative azadiene species needed for the proposed IMDA construction. In the other, for the paraherquamide and malbrancheamide systems, the NRPS-loaded dipeptide substrate is cleaved in a net two-electron reduction to give a product that we speculate cyclizes and dehydrates to the related (reduced) azadiene species for the homologous IMDA construction. This insight emerged most readily from analysis of the respective gene cluster annotations, and has provided a very satisfying level of corroboration with labeled precursor incorporation experiments that at first seemed incongruous. We expect that the insights the bioinformatics analyses have provided in these systems will render understanding the possible biogenesis of these and related natural products more efficient, congruent and intellectually satisfying.

The foregoing description and examples have been set forth merely to illustrate the invention and are not intended to be limiting. Since modifications of the described embodiments incorporating the spirit and substance of the invention may occur to persons skilled in the art, the invention should be construed broadly to include all variations within the scope of the appended claims and equivalents thereof.

REFERENCES

-   1. J. W.-H. Li and J. C. Vederas, Science, 2009, 325, 161.
-   2. D. J. Newman and G. M. Cragg, J. Nat. Prod., 2007, 70, 461.
-   3. R. Li and C. A. Townsend, Metab. Eng., 2006, 8, 240.
-   4. R. H. Baltz, J. Ind. Microbiol. Biotechnol., 1998, 20, 360.
-   5. S. Baba, Y. Abe, T. Suzuki, C. Ono, K. Iwamoto, T. Nihira and M. Hosobuchi, Appl. Microbiol. Biotechnol., 2009, 83, 697.
-   6. J.-H. Noh, S.-H. Kim, H.-N. Lee, S. Y. Lee and E.-S. Kim, Appl. Microbiol. Biotechnol., 2010, 86, 1145.
-   7. W. R. Strohl, Metab. Eng., 2001, 3, 4.
-   8. D. E. Cane, C. T. Walsh and C. Khosla, Science, 1998, 282, 63.
-   9. C. T. Walsh, ChemBioChem, 2002, 3, 125.
-   10. C. Sanchez, L. Zhu, A. F. Brana, A. P. Salas, J. Rohr, C. Mendez and J. A. Salas, Proc. Natl. Acad. Sci. U.S.A., 2005, 102, 461.
-   11. J. Pollier, T. Moses and A. Goossens, Nat. Prod. Rep., 2011, 28, 1897.
-   12. J. L. Que and W. B. Tolman, Nature, 2008, 455, 333.
-   13. A. L. Goff, V. Artero, B. Jousselme, P. D. Tran, N. Guillet, R. Métayé, A. Fihri, S. Palacin and M. Fontecave, Science, 2009, 326, 1384.
-   14. K. T. Watts, B. N. Mijts and C. Schmidt-Dannert, Adv. Synth. Catal., 2005, 347, 927.
-   15. Y. Xue, L. Zhao, H.-w. Liu and D. H. Sherman, Proc. Natl. Acad. Sci. U.S.A., 1998, 95, 12111.
-   16. J. C. Carlson, J. L. Fortman, Y. Anzai, S. Li, D. A. Burr and D. H. Sherman, ChemBioChem, 2010, 11, 564.
-   17. J. D. Kittendorf and D. H. Sherman, Bioorg. Med. Chem., 2009, 17, 2137.
-   18. J. C. Carlson, S. Li, S. S. Gunatilleke, Y. Anzai, D. A. Burr, L. M. Podust and D. H. Sherman, Nat. Chem., 2011, 3, 628.
-   19. H. Zhang, B. A. Boghigian, J. Armando and B. A. Pfeifer, Nat. Prod. Rep., 2011, 28, 125.
-   20. U. Galm and B. Shen, Expert Opin. Drug. Discov., 2006, 1, 409.
-   21. L. Tang, S. Shah, L. Chung, J. Carney, L. Katz, C. Khosla and B. Julien, Science, 2000, 287, 640.
-   22. Q. Cheng, L. Xiang, M. Izumikawa, D. Meluzzi and B. S. Moore, Nat. Chem. Biol., 2007, 3, 557.
-   23. C. J. Balibar, A. R. Howard-Jones and C. T. Walsh, Nat. Chem. Biol., 2007, 3, 584.
-   24. L. Gu, B. Wang, A. Kulkarni, T. W. Geders, R. V. Grindberg, L. Gerwick, K. Hakansson, P. Wipf, J. L. Smith, W. H. Gerwick and D. H. Sherman, Nature, 2009, 459, 731.
-   25. Y. Anzai, S. Li, M. R. Chaulagain, K. Kinoshita, J. Montgomery and D. H. Sherman, Chem. Biol., 2008, 15, 950.
-   26. U. Galm, E. Wendt-Pienkowski, L. Wang, S.-X. Huang, C. Unsin, M. Tao, J. M. Coughlin and B. Shen, J. Nat. Prod., 2011, 74, 526.
-   27. B. Peant, G. LaPointe, C. Gilbert, D. Atlan, P. Ward and D. Roy, Microbiology, 2005, 151, 1839.
-   28. K. S. Ryan, PLoS One, 2011, 6, e23694.
-   29. K. Buntin, H. Irschik, K. J. Weissman, E. Luxenburger, H. Blöcher and R. Müller, Chem. Biol., 2010, 17, 342.
-   30. R. D. Hawkins, G. C. Hon and B. Ren, Nat. Rev. Genet., 2010, 11, 476.
-   31. M. L. Metzker, Nat. Rev. Genet., 2010, 11, 31.
-   32. T. J. Treangen and S. L. Salzberg, Nat. Rev. Genet., 2012, 13, 36.
-   33. C. Shaffer, Nat. Biotechnol., 2007, 25, 149.
-   34. S. C. Schuster, Nat. Methods, 2008, 5, 16.
-   35. H. Kato, T. Yoshida, T. Tokue, Y. Nojiri, H. Hirota, T. Ohta, R. M. Williams and S. Tsukamoto, Angew. Chem. Intl. Ed., 2007, 46, 2254.
-   36. T. J. Greshock, A. W. Grubbs, P. Jiao, D. T. Wicklow, J. B. Gloer and R. M. Williams, Angew. Chem. Intl. Ed., 2008, 47, 3573.
-   37. M. Yamazaki, E. Okuyama, M. Kobayashi and H. Inoue, Tetrahedron Lett., 1981, 22, 135.
-   38. J. G. Ondeyka, R. T. Goegelman, J. M. Schaeffer, L. Kelemen and L. Zitano, J. Antibiot., 1990, 43, 1375.
-   39. R. M. Williams, J. Gao, H. Tsujishima and R. J. Cox, J. Am. Chem. Soc., 2003, 125, 12172.
-   40. S. Martinez-Luis, R. Rodriguez, L. Acevedo, M. C. Gonzalez, A. Lira-Rocha and R. Mata, Tetrahedron, 2006, 62, 1817.
-   41. M. Figueroa, M. C. Gonzalez and R. Mata, Nat. Prod. Res., 2008, 22, 709.
-   42. K. A. Miller, T. R. Welch, T. J. Greshock, Y. Ding, D. H. Sherman and R. M. Williams, J. Org. Chem., 2008, 73, 3116.
-   43. Y. Ding, J. R. deWet, J. Cavalcoli, S. Li, T. J. Greshock, K. A. Miller, J. M. Finefield, J. D. Sunderhaus, T. J. McAfoos, S. Tsukamoto, R. M. Williams and D. H. Sherman, J. Am. Chem. Soc., 2010, 132, 12733.
-   44. R. M. Williams and R. J. Cox, Acc. Chem. Res., 2003, 36, 127.
-   45. E. M. Stocking, J. F. Sanz-Cervera, C. J. Unkefer and R. M. Williams, Tetrahedron, 2001, 57, 5303.
-   46. E. M. Stocking and R. M. Williams, Angew. Chem. Intl. Ed., 2003, 42, 3078.
-   47. J. D. Sunderhaus, D. H. Sherman and R. M. Williams, Isr. J. Chem., 2011, 51, 442.
-   48. A. W. Grubbs, G. D. I. Artman, S. Tsukamoto and R. M. Williams, Angew. Chem. Intl. Ed., 2007, 46, 2257.
-   49. T. J. Greshock, A. W. Grubbs, S. Tsukamoto and R. M. Williams, Angew. Chem. Intl. Ed., 2007, 46, 2262.
-   50. T. A. Keating, D. E. Ehmann, R. M. Kohli, C. G. Marshall, J. W. Trauger and C. T. Walsh, ChemBioChem, 2001, 2, 99.
-   51. N. Steffan, A. Grundmann, S. Afiyatullov, H. Ruan and S.-M. Li, Org. Biomol. Chem., 2009, 7, 4082.
-   52. R. P. Hausinger, Crit. Rev. Biochem. Mol. Biol., 2004, 39, 21.
-   53. A. J. Birch and J. J. Wright, J. Chem. Soc. Chem. Commun., 1969, 644.
-   54. S.-M. Li, J. Antibiot., 2011, 64, 45.
-   55. N. Kato, H. Suzuki, H. Takagi, H. Kakeya, M. Uramoto, T. Usui, S. Takahashi, Y. Sugimoto and H. Osada, ChemBioChem, 2009, 10, 920.
-   56. P. S. Steyn, Tetrahedron Lett., 1971, 12, 3331.
-   57. S. Tsukamoto, H. Kato, T. J. Greshock, H. Hirota, T. Ohta and R. M. Williams, J. Am. Chem. Soc., 2009, 131, 3834.
-   58. S. Li, J. M. Finefield, J. D. Sunderhaus, T. J. McAfoos, R. M. Williams and D. H. Sherman, J. Am. Chem. Soc., 2012, 134, 788.
-   59. M. Oliynyk, C. B. W. Stark, A. Bhatt, M. A. Jones, Z. A. Hugher-Thomas, C. Wilkinson, Z. Oliynyk, Y. Demydchuk, J. Staunton and P. F. Leadlay, Mol. Microbiol., 2003, 49, 1179.
-   60. J. Qian-Cutrone, S. Huang, Y. Z. Shu, D. Vyas, C. Fairchild, A. Menendez, K. Krappitz, R. Dalterio, S. E. Klohr and Q. Gao, J. Am. Chem. Soc., 2002, 124, 14556.
-   61. J. M. Finefield, H. Kato, T. J. Greshock, D. H. Sherman, S. Tsukamoto and R. M. Williams, Org. Lett., 2011, 13, 3802.
-   62. C. Authrine and J. B. Gloer, J. Nat. Prod., 1996, 59, 1093.
-   63. E. M. Stocking, J. F. Sanz-Cervera and R. M. Williams, J. Am. Chem. Soc., 2000, 122, 1675.
-   64. H. Luesch, D. Hoffmann, J. M. Hevel, J. E. Becker, T. Golakoti and R. E. Moore, J. Org. Chem., 2002, 68, 83.
-   65. Y. Ding, S. Gruschow, T. J. Greshock, J. M. Finefield, D. H. Sherman and R. M. Williams, J. Nat. Prod., 2008, 71, 1574.
-   66. E. M. Stocking, J. F. Sanz-Cervera and R. M. Williams, Angew. Chem. Intl. Ed., 2001, 40, 1296.
-   67. J. M. Liesch and C. F. Wichmann, J. Antibiot., 1990, 43, 1380.
-   68. K. S. Hewitson, N. Granatino, R. W. D. Welford, M. A. McDonough and C. J. Schofield, Phil. Trans. R. Soc. A, 2005, 363, 807.
-   69. Y. Ding, T. J. Greshock, K. A. Miller, D. H. Sherman and R. M. Williams, Org. Lett., 2008, 10, 4863.
-   70. K. H. vanPee and E. P. Patallo, Appl. Microbiol. Biotechnol., 2006, 70, 631.
-   71. J. Zeng and J. Zhan, ChemBioChem, 2010, 11, 2119.
-   72. C. S. Neumann, C. T. Walsh and R. R. Kay, Proc. Natl. Acad. Sci. U.S.A., 2010, 107, 5798.
-   73. C. Dong, S. Flecks, S. Unversucht, C. Haupt, K. H. vanPee and J. H. Naismith, Science, 2005, 309, 2216.
-   74. K. R. Watts, S. T. Loveridge, K. Tenney, J. Media, F. A. Valeriote and P. Crews, J. Org. Chem., 2011, 76, 6201.

TABLE 1 Comparative analysis* of gene clusters of not, not′, phq, and mal

Not proteins (AA) | Function | Not′ proteins (AA) | Function (% identity to corresponding Not protein) | Phq proteins (AA) | Function (% identity to corresponding Not protein) | Mal proteins (AA) | Function (% identity to corresponding Not/Phq protein)
NotA (339) | Negative regulator | NotA′ (334) | Negative regulator (70% NotA) | PhqA (405) | Prenyltransferase (22% NotC) | MalA (667) | Halogenase (—/—)
NotB (456) | FAD monooxygenase | NotB′ (455) | FAD monooxygenase (88% NotB) | PhqB (2449) | NRPS [A-T-C-A-T-R] (26% NotE) | MalB (369) | Prenyltransferase (28% NotC/34% PhqA)
NotC (427) | Prenyltransferase | NotC′ (426) | Prenyltransferase (87% NotC) | PhqC (353) | 2OG-Fe(II)-oxygenase (—) | MalC (264) | Short chain dehydrogenase (—/52% PhqE)
NotD (621) | Oxidoreductase | NotD′ (612) | Oxidoreductase (80% NotD) | PhqD (322) | Pyrroline-5-carboxylate reductase (—) | MalD (336) | Negative regulator (36% NotA/55% PhqG)
NotE (2241) | NRPS [A-T-C-A-T-C] | NotE′ (2225) | NRPS [A-T-C-A-T-C] (79% NotE) | PhqE (265) | Short chain dehydrogenase (—) | MalE (438) | Prenyltransferase (36% NotF/34% PhqJ)
NotF (453) | Prenyltransferase | NotF′ (435) | Prenyltransferase (79% NotF) | PhqF (411) | Efflux pump (18% NotK) | MalF (590) | Oxidoreductase (37% NotD/39% PhqH)
NotG (544) | P450 monooxygenase | NotG′ (544) | P450 monooxygenase (87% NotG) | PhqG (338) | Negative regulator (34% NotA) | MalG (2345) | NRPS [A-T-C-A-T-R] (27% NotE/37% PhqB)
NotH (502) | P450 monooxygenase | NotH′ (499) | P450 monooxygenase (84% NotH) | PhqH (602) | Oxidoreductase (34% NotD) | |
NotI (434) | FAD monooxygenase | NotI′ (433) | FAD monooxygenase (85% NotI) | PhqI (462) | Prenyltransferase (—) | |
NotJ (371) | Unknown | NotJ′ (362) | Unknown (80% NotJ) | PhqJ (406) | Prenyltransferase (32% NotF) | |
NotK (564) | Efflux pump | NotK′ (577) | Efflux pump (14% NotK) | PhqK (459) | FAD monooxygenase (32% NotI) | |
NotL (484) | Transcriptional activator | NotL′ (620) | Transcriptional factor (15% NotL) | PhqL (563) | P450 monooxygenase (29% NotG) | |
NotM (402) | Unknown | NotM′ (454) | Unknown (—) | PhqM (536) | P450 monooxygenase (15% NotH) | |
NotN (340) | Dehydrogenase | NotN′ (416) | Unknown (—) | PhqN (326) | Methyltransferase | |
NotO (331) | Short-chain dehydrogenase | NotO′ (462) | Unknown (—) | PhqO (451) | P450 monooxygenase (—) | |
NotP (322) | Unknown | NotP′ (292) | Unknown (—) | | | |
NotQ (152) | Unknown | NotQ′ (506) | Transcription factor | | | |
NotR (461) | Transcriptional coactivator | NotR′ (172) | Unknown | | | |

*Genes were predicted using the FGENESH-M tool. Functions of gene products were predicted using BLAST search.

What is claimed is:
 1. A protein in the prenylated indole alkaloid pathway, wherein the protein is a MalG protein having an amino acid sequence that is 98% or more identical to SEQ ID NO: 15, comprises at least one amino acid substitution, insertion or deletion relative to SEQ ID NO: 15 and has MalG activity.
 2. The MalG protein of claim 1 further comprising a chlorinated tryptophan loaded onto the second thiolation (T) domain of the protein.
 3. A polynucleotide encoding a protein in the prenylated indole alkaloid pathway, wherein the polynucleotide encodes a MalG protein having an amino acid sequence that is 98% or more identical to SEQ ID NO: 15, comprises at least one amino acid substitution, insertion or deletion relative to SEQ ID NO: 15 and has MalG activity.
 4. A host cell transformed with the polynucleotide of claim 3.
 5. An expression vector comprising the polynucleotide of claim 3.
 6. A host cell transformed with the expression vector of claim 5.
 7. A method for producing prenylated indole alkaloid or a metabolic intermediate for producing a prenylated indole alkaloid comprising the step of growing a host cell comprising the polynucleotide of claim 3 under conditions to express the protein and producing a prenylated indole alkaloid or the metabolic intermediate for producing a prenylated indole alkaloid.
 8. The method of claim 7 further comprising the step of isolating the prenylated indole alkaloid or the metabolic intermediate of the prenylated indole alkaloid.
 9. The method of claim 8 wherein the host cell is a prokaryote.
 10. The method of claim 9 wherein the host cell is selected from the group consisting of Escherichia coli, Streptomyces lavendulae, Myxococcus xanthus, and Pseudomonas fluorescens. 